…rid reward strategies
import numpy as np
_GAP_FLOOR = 1e-8  # BBOB precision target: gaps below this count as "solved".
Nice cap. It may interfere with the AOCC computation, though. Double-check that fitness isn't clipped twice there.
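To make the concern concrete, here is a minimal sketch of how a second clip could change an AOCC-style score. `floor_gap` and `aocc_like` are hypothetical helpers, not the functions in this PR, and the bounds used are assumptions; only `_GAP_FLOOR` comes from the diff.

```python
import numpy as np

_GAP_FLOOR = 1e-8  # value taken from the diff


def floor_gap(gaps, floor=_GAP_FLOOR):
    """Hypothetical reward-side floor: gaps below `floor` are raised to it."""
    return np.maximum(gaps, floor)


def aocc_like(gaps, lower=1e-8, upper=1e2):
    """Hypothetical AOCC-style score: mean of 1 minus the normalized log10 gap.

    The bounds `lower` and `upper` are assumptions for illustration.
    """
    log_gaps = np.log10(np.clip(gaps, lower, upper))
    span = np.log10(upper) - np.log10(lower)
    normalized = (log_gaps - np.log10(lower)) / span
    return float(np.mean(1.0 - normalized))


raw_gaps = np.array([1.0, 1e-4, 1e-10])

# Same floor on both sides: the second clip is a no-op.
same_bounds_equal = np.isclose(aocc_like(raw_gaps), aocc_like(floor_gap(raw_gaps)))

# AOCC with a lower bound below the reward floor: pre-floored gaps score lower.
score_raw = aocc_like(raw_gaps, lower=1e-10)
score_floored = aocc_like(floor_gap(raw_gaps), lower=1e-10)
```

If both clips use the same lower bound, clipping twice is harmless. The score only shifts when the AOCC bound is lower than `_GAP_FLOOR`, so that is the case worth checking.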
- def reward_binary(new_best_y, old_best_y, initial_range, is_final=False):
+ def reward_binary(new_best_y, old_best_y, initial_range, is_final=False, optimum=None):
It would be nice to have unit tests for all of these reward definitions.
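A minimal sketch of what such tests could look like. Only the signature of `reward_binary` is taken from the diff; its body here is a hypothetical stand-in, and the properties checked in `check_reward_contract` (also a hypothetical name) are assumptions about what every reward should satisfy.

```python
import math


def reward_binary(new_best_y, old_best_y, initial_range, is_final=False, optimum=None):
    """Hypothetical stand-in body: 1.0 if the best-so-far value improved, else 0.0."""
    return 1.0 if new_best_y < old_best_y else 0.0


def check_reward_contract(reward_fn):
    """Check assumed properties shared by all reward functions with this signature."""
    # An improving step should never earn less than a stagnating one.
    improved = reward_fn(1.0, 2.0, 10.0)
    stalled = reward_fn(2.0, 2.0, 10.0)
    assert improved >= stalled

    # First step: old_best_y is inf, and the reward must still be finite.
    first = reward_fn(5.0, float("inf"), 10.0)
    assert math.isfinite(first)

    # The optional arguments must be accepted without breaking the result.
    final = reward_fn(1.0, 2.0, 10.0, is_final=True, optimum=0.0)
    assert math.isfinite(final)
    return True


contract_ok = check_reward_contract(reward_binary)
```

Running the same contract check over every reward function (for example with `pytest.mark.parametrize`) would cover the shared edge cases once, leaving only the strategy-specific values to test individually.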
def reward_log_scaled(new_best_y, old_best_y, initial_range, is_final=False, optimum=None):
    """Log-scaled incremental improvement (original r1)."""
    if old_best_y == float("inf"):
The logarithm in the first reward was added to avoid reward hacking. In general, rewards that do not take the global optimum into account are hard to protect from being hacked. I think it's also important to keep in mind that inserting the global minimum into the reward makes the meta-BBO task significantly easier. It would be nice to compare the global-optimum-aware rewards to each other, but not necessarily to the ones that do not take the GO into account.
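A minimal sketch of why the two families are hard to compare directly. Both functions below are hypothetical illustrations, not the definitions in this PR: `reward_log_gap` assumes the optimum is known, `reward_log_improvement` does not, and returning 0.0 on the first step is an assumption. Only `_GAP_FLOOR` is taken from the diff.

```python
import math

_GAP_FLOOR = 1e-8  # value taken from the diff


def reward_log_gap(new_best_y, old_best_y, optimum):
    """Hypothetical optimum-aware reward: decades of optimality gap closed."""
    if old_best_y == float("inf"):
        return 0.0  # assumption: no reward for the very first evaluation
    new_gap = max(new_best_y - optimum, _GAP_FLOOR)
    old_gap = max(old_best_y - optimum, _GAP_FLOOR)
    return math.log10(old_gap) - math.log10(new_gap)


def reward_log_improvement(new_best_y, old_best_y, initial_range):
    """Hypothetical optimum-free reward: log-damped relative improvement."""
    if old_best_y == float("inf"):
        return 0.0  # assumption: no reward for the very first evaluation
    improvement = max(old_best_y - new_best_y, 0.0)
    return math.log1p(improvement / initial_range)


# The same step (best value 1e-1 -> 1e-3, optimum at 0, initial range 10).
aware = reward_log_gap(1e-3, 1e-1, optimum=0.0)
unaware = reward_log_improvement(1e-3, 1e-1, initial_range=10.0)
```

For this step the optimum-aware variant returns roughly 2 (two decades of gap closed), while the optimum-free variant stays below 0.01, because it can only see the absolute improvement relative to the initial range. The optimum-aware signal carries strictly more information, which is why comparing across the two groups is not an even contest.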